refactor!: change model to output mel-band oriented tensors instead of time-oriented ones #94

roedoejet · 2024-10-30T01:18:17Z

semanticdiff-com · 2024-10-30T01:18:19Z

Review changes with

Changed Files

File	Status
fs2/prediction_writing_callback.py	17% smaller
fs2/cli/synthesize.py	16% smaller
fs2/model.py	0% smaller

joanise

This looks good, modulo some questions in the comments below.

joanise · 2024-12-09T22:16:58Z

fs2/prediction_writing_callback.py

-        assert "output" in outputs and outputs["output"] is not None
-        assert wavs.shape[0] == outputs["output"].size(
+            wavs.ndim == 3
+        ), f"The generated audio did not contain 3 dimensions. First dimension should be B(atch) and the second dimension should be C(hannels) and third dimension should be T(ime) in samples. Got {wavs.shape} instead."


Can this happen due to a user error (like providing the wrong kind of input file), or is this strictly due to a programmer error? If the latter, OK, if the former, I don't like using assert.
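
One way to frame the distinction raised above: an `assert` documents an invariant that only a programming mistake can break (and is stripped under `python -O`), while input that a user can get wrong is usually reported with an explicit exception. The following is a minimal sketch, not the code in this PR. `check_wav_dims` is a hypothetical helper, and it only assumes that the object exposes `ndim` and `shape` attributes, as a torch tensor does.

```python
from types import SimpleNamespace


def check_wav_dims(wavs) -> None:
    """Raise ValueError unless `wavs` looks like a [B, C, T] batch."""
    if wavs.ndim != 3:
        raise ValueError(
            "Expected generated audio with 3 dimensions "
            "[B(atch), C(hannels), T(ime in samples)], "
            f"got shape {tuple(wavs.shape)} instead."
        )


# Stand-ins for tensors, so the sketch runs without torch installed.
good = SimpleNamespace(ndim=3, shape=(2, 1, 22050))
bad = SimpleNamespace(ndim=2, shape=(2, 22050))

check_wav_dims(good)  # passes silently
try:
    check_wav_dims(bad)
except ValueError as err:
    print(f"rejected: {err}")
```

If the check can only fail because of a bug in the calling code, keeping the `assert` is a reasonable choice; the explicit exception matters mainly when the shape depends on user-supplied input.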

joanise · 2024-12-09T22:20:39Z

fs2/prediction_writing_callback.py

-                    basename=basename,
-                    speaker=speaker,
-                    language=language,
+            torchaudio.save(


Is the change of audio writer function related to this PR, or just an unrelated improvement? I assume you've tested and you can confirm this works well?

joanise · 2024-12-09T22:26:38Z

fs2/prediction_writing_callback.py

+                data[:unmasked_len]
+                .cpu()
+                .transpose(0, 1),  # save tensors as [K (bands), T (frames)]
+                str(


We didn't use to need to cast this Path to a str; I wonder why you do now. In my PR #102, get_filename is factored out to the base class. We should have it return str(path) in one place, inside get_filename, instead of casting everywhere we use it.
Warning: Whichever PR is merged second will have to rebase and resolve conflicts over the use of get_filename.
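
The suggestion above can be sketched as follows. This is an illustration under assumptions, not the code from PR #102: the class name `BaseWriter`, the `save_dir` and `file_extension` attributes, and the file-name pattern are all hypothetical. Only the idea of returning `str(path)` from `get_filename` comes from the comment.

```python
from pathlib import Path


class BaseWriter:
    """Hypothetical base class; the real one in PR #102 may differ."""

    def __init__(self, save_dir: str, file_extension: str) -> None:
        self.save_dir = Path(save_dir)
        self.file_extension = file_extension

    def get_filename(self, basename: str, speaker: str, language: str) -> str:
        # Build the path with pathlib, then convert to str once here,
        # so callers never need their own str(...) cast.
        name = f"{basename}--{speaker}--{language}{self.file_extension}"
        return str(self.save_dir / name)


writer = BaseWriter("synthesis_output", ".pt")
filename = writer.get_filename("utt001", "spk1", "eng")
print(type(filename).__name__)  # str
```

Converting in one place keeps call sites free of casts and means a later change to the return type only touches `get_filename`.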

rebased and fixed using my suggestion here

codecov · 2024-12-10T17:56:55Z

Codecov Report

Attention: Patch coverage is 66.66667% with 3 lines in your changes missing coverage. Please review.

Project coverage is 46.13%. Comparing base (2afc610) to head (9549082).

Files with missing lines	Patch %	Lines
fs2/model.py	0.00%	2 Missing ⚠️
fs2/cli/synthesize.py	0.00%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main      #94      +/-   ##
==========================================
- Coverage   46.24%   46.13%   -0.12%     
==========================================
  Files          22       22              
  Lines        1464     1461       -3     
==========================================
- Hits          677      674       -3     
  Misses        787      787


roedoejet mentioned this pull request Oct 30, 2024

consolidate spectrogram dimensions EveryVoiceTTS/EveryVoice#572

Open

joanise approved these changes Dec 9, 2024


roedoejet added 3 commits December 10, 2024 12:26

refactor!: change model to output mel-band oriented tensors instead of time-oriented ones

5b5abe0

fix: write strings not paths using torch

229645c

fix: the vocoder expects [B, K, T] tensors and this applies during training too

9549082

joanise force-pushed the dev.ap/513 branch from 47b94e9 to 9549082 on December 10, 2024 17:42
